Pretrial Detention In Virginia¶

Influence of Race and Geography¶

A Bayesian Analysis¶

by Brandtly Jones and Eric Tang

Introduction¶

Every year, well over 300,000 individuals are arrested in Virginia for offenses big and small. What happens next can vary widely. Most are released quickly on their own recognizance or with a bond posted to ensure that they return for their court date, but others face pretrial detention, sometimes all the way to trial even before being convicted of a crime.

This analysis will look at the role that race and geography (i.e. judicial circuit) play in determining decisions about pretrial release. While controling for factors that contribute to release conditions, we will seek to detect if racial disparities are evident in the available data concerning pretrial release and if those disparities vary across judicial district. Using partial pooling and hierarchical modeling we intend to evaluate posterior distributions on race and geography as predictors of pretrial treatment of individuals, and thereby quantify the uncertainty in the model.

The data set contains approximately 22,000 de-identified individuals with contact events with the criminal justice system in October of 2017. The data contains over 700 variables, including demographic information about the defendants as well as their criminal history, locality, offense, lawyer type (e.g. public defender or private attorney), and their pretrial outcomes. We have reduced these to a manageable set of predictors and the binary response variable of whether a defendant was held for their entire pretrial period.

In [5]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import arviz as az
In [2]:
#initial paring down of full dataset
# df=pd.read_csv("http://www.vcsc.virginia.gov/pretrialdataproject/October%202017%20Cohort_Virginia%20Pretrial%20Data%20Project_Deidentified%20FINAL%20Update_10272021.csv", low_memory=False)
In [49]:
"""
cols=df.columns
selected_cols=[item[1] for item in
    [(2, 'Defendant_Sex'),
    (3, 'Defendant_Race'),
    (5, 'Defendant_Age'),
    (9, 'Defendant_IndigencyStatus'),
    (12, 'WhetherDefendantWasReleasedPretrial'),
    (14, 'DaysBetweenContactEventAndPretrialRelease'),
    (16, 'PretrialReleaseType2'),
    (17, 'BondTypeAtInitialContact'),
    (18, 'BondAmountAtInitialContact'),
    (19, 'BondTypeAtRelease_v1'),
    (21, 'BondAmountAtRelease'),
    (25, 'Indicator_PresumptiveDenialOfBail_19.2_120'),
    (168, 'PriorArrests'),
    (301, 'Locality_JudicialCircuit'),
    (303, 'Locality_MagisterialRegion'),
    (563, 'FollowUp_ChargedWithNewFTA'),
    (572, 'FollowUp_ArrestedforNewOff'),
    (708, 'CrimeCommission2021ReportClassificationofDefendants')]]+['VPRAI_TotalPoints_Opt1',
     'VPRAI_TotalPoints_Opt2',
     'PSA_FTA_TotalPoints',
     'PSA_NCA_TotalPoints',
     'PSA_NVCA_TotalPoints']
     """
Out[49]:
"\ncols=df.columns\nselected_cols=[item[1] for item in\n    [(2, 'Defendant_Sex'),\n    (3, 'Defendant_Race'),\n    (5, 'Defendant_Age'),\n    (9, 'Defendant_IndigencyStatus'),\n    (12, 'WhetherDefendantWasReleasedPretrial'),\n    (14, 'DaysBetweenContactEventAndPretrialRelease'),\n    (16, 'PretrialReleaseType2'),\n    (17, 'BondTypeAtInitialContact'),\n    (18, 'BondAmountAtInitialContact'),\n    (19, 'BondTypeAtRelease_v1'),\n    (21, 'BondAmountAtRelease'),\n    (25, 'Indicator_PresumptiveDenialOfBail_19.2_120'),\n    (168, 'PriorArrests'),\n    (301, 'Locality_JudicialCircuit'),\n    (303, 'Locality_MagisterialRegion'),\n    (563, 'FollowUp_ChargedWithNewFTA'),\n    (572, 'FollowUp_ArrestedforNewOff'),\n    (708, 'CrimeCommission2021ReportClassificationofDefendants')]]+['VPRAI_TotalPoints_Opt1',\n     'VPRAI_TotalPoints_Opt2',\n     'PSA_FTA_TotalPoints',\n     'PSA_NCA_TotalPoints',\n     'PSA_NVCA_TotalPoints']\n     "
In [4]:
# df[selected_cols].to_csv("SelectPretrialData.csv", index=False)
In [2]:
df=pd.read_csv("SelectPretrialData.csv")

Cleaning¶

The overwhelming majority of defendants are identified as white or black, so we will focus just on these.

Pretrial release is coded 0, 1, or 9 for unclear. There are only 30 unknowns, or .13% which won't affect the analysis so these are dropped.

13 records do not include a Judicial Circuit so these are dropped.

In [3]:
cols=['Defendant_Race', 'WhetherDefendantWasReleasedPretrial', 'Locality_JudicialCircuit']
minimal_df=df[cols].query("((Defendant_Race=='W')|(Defendant_Race=='B'))")
minimal_df=minimal_df.query("Locality_JudicialCircuit!=' '")
minimal_df=minimal_df.query("WhetherDefendantWasReleasedPretrial!=9")

Some overview of the racial breakdown of the data¶

In [105]:
#base rate of pre-trial release for all defendants across Virginia
ax = minimal_df['WhetherDefendantWasReleasedPretrial'].replace({1:"Released",0:"Held Pretrial"}).value_counts(normalize=True).plot(kind='bar')
for p in ax.patches:
    ax.annotate(str(round(p.get_height(),2)), (p.get_x() * 1.05, p.get_height() * 1.005))
plt.title("Proportion of defendants released pretrial")
plt.xticks(rotation=0)
plt.savefig("Proportion of defendants released pretrial.png")
plt.show()
In [107]:
bw=pd.crosstab(minimal_df['Defendant_Race'],1-minimal_df['WhetherDefendantWasReleasedPretrial'], normalize='index')
bw
Out[107]:
WhetherDefendantWasReleasedPretrial 0 1
Defendant_Race
B 0.811538 0.188462
W 0.846265 0.153735
In [108]:
ax=bw.plot.bar(stacked=True)
for p in ax.patches:
    if p.get_height()>.8:
        ax.annotate(str(round(p.get_height(),2)), (p.get_x()+.2, p.get_height() * 1.005))

plt.legend(['Released', 'Held Pretrial'],loc='lower left')
plt.title("White defendants are released at a higher rate than Black defendants")
plt.savefig("White defendants released.png");
In [109]:
dist_bw=minimal_df.groupby(["Locality_JudicialCircuit",'Defendant_Race']).agg({'WhetherDefendantWasReleasedPretrial':'mean'})
dist_bw=dist_bw.reset_index()
dist_bw_pivot=dist_bw.pivot_table(columns=['Defendant_Race'], index=['Locality_JudicialCircuit'])
dist_bw_pivot['Difference']=100*(dist_bw_pivot['WhetherDefendantWasReleasedPretrial']['W']-dist_bw_pivot['WhetherDefendantWasReleasedPretrial']['B'])
fig, ax=plt.subplots(figsize=(10,10))
plt.stem(dist_bw_pivot.index, dist_bw_pivot['Difference'])
plt.xticks(np.arange(1,32),rotation=70)
plt.ylabel("Difference in release rates")
plt.xlabel("Judicial District")
plt.suptitle("White Defendants are released at a higher rate in 24 of 31 Judicial Districts")
plt.title("Stems above the line indicate a higher rate of release for White defendants")
plt.savefig("District differences.png");

So just looking at proportions we have an intuition that there may be variation in treatment of defendants based on race in the overall population, and that we may have some districts where that difference is more pronounced.

To investigate this phenomenon we will apply a hierarchical model to the data.

In [7]:
import pymc as pm
import arviz as az
az.style.use('arviz-darkgrid')
SEED = 20190518 # from random.org, for reproducibility
np.random.seed(SEED)
import scipy as sp
In [8]:
minimal_df['Indicator_Black']=(minimal_df['Defendant_Race']=='B').astype(int)
jud_districts = minimal_df.Locality_JudicialCircuit.unique()
districts = len(jud_districts) 
district = minimal_df['Locality_JudicialCircuit'].values.astype(float).astype(int)-1
np.unique(district)
Out[8]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])
In [13]:
with pm.Model() as partial_pooling:

    # Priors
    mu_a = pm.Normal('mu_a', mu=0., sigma=1e5)
    sigma_a = pm.HalfCauchy('sigma_a', 5)

    # Random intercepts
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=districts)

    # Model error
    sigma_y = pm.HalfCauchy('sigma_y',5)

    # Expected value
    y_hat = a[district]

    # Data likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sigma=sigma_y, observed=minimal_df['WhetherDefendantWasReleasedPretrial'])

pm.model_to_graphviz(partial_pooling)
Out[13]:
<svg width="310pt" height="358pt" viewBox="0.00 0.00 310.00 358.00" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 354)"> <title>%3</title> <polygon fill="white" stroke="white" points="-4,5 -4,-354 307,-354 307,5 -4,5"/> <g id="clust1" class="cluster"><title>cluster31</title> <path fill="none" stroke="black" d="M143,-131C143,-131 219,-131 219,-131 225,-131 231,-137 231,-143 231,-143 231,-234 231,-234 231,-240 225,-246 219,-246 219,-246 143,-246 143,-246 137,-246 131,-240 131,-234 131,-234 131,-143 131,-143 131,-137 137,-131 143,-131"/> <text text-anchor="middle" x="215" y="-138.8" font-family="Times,serif" font-size="14.00">31</text> </g> <g id="clust2" class="cluster"><title>cluster22463</title> <path fill="none" stroke="black" d="M82,-8C82,-8 158,-8 158,-8 164,-8 170,-14 170,-20 170,-20 170,-111 170,-111 170,-117 164,-123 158,-123 158,-123 82,-123 82,-123 76,-123 70,-117 70,-111 70,-111 70,-20 70,-20 70,-14 76,-8 82,-8"/> <text text-anchor="middle" x="143" y="-15.8" font-family="Times,serif" font-size="14.00">22463</text> </g> <g id="node1" class="node"><title>sigma_y</title> <ellipse fill="none" stroke="black" cx="60" cy="-200" rx="60.2186" ry="37.4533"/> <text text-anchor="middle" x="60" y="-211.3" font-family="Times,serif" font-size="14.00">sigma_y</text> <text text-anchor="middle" x="60" y="-196.3" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="60" y="-181.3" font-family="Times,serif" font-size="14.00">HalfCauchy</text> </g> <g id="node5" class="node"><title>y_like</title> <ellipse fill="lightgrey" stroke="black" cx="120" cy="-77" rx="42.1401" ry="37.4533"/> <text text-anchor="middle" x="120" y="-88.3" font-family="Times,serif" font-size="14.00">y_like</text> <text text-anchor="middle" x="120" y="-73.3" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="120" y="-58.3" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge3" class="edge"><title>sigma_y->y_like</title> <path fill="none" stroke="black" d="M77.3545,-164.002C84.0548,-150.49 91.7851,-134.9 98.8086,-120.736"/> <polygon fill="black" stroke="black" points="101.997,-122.184 103.304,-111.67 95.7259,-119.074 101.997,-122.184"/> </g> <g id="node2" class="node"><title>mu_a</title> <ellipse fill="none" stroke="black" cx="121" cy="-312" rx="42.1401" ry="37.4533"/> <text text-anchor="middle" x="121" y="-323.3" font-family="Times,serif" font-size="14.00">mu_a</text> <text text-anchor="middle" x="121" y="-308.3" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="121" y="-293.3" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="node4" class="node"><title>a</title> <ellipse fill="none" stroke="black" cx="181" cy="-200" rx="42.1401" ry="37.4533"/> <text text-anchor="middle" x="181" y="-211.3" font-family="Times,serif" font-size="14.00">a</text> <text text-anchor="middle" x="181" y="-196.3" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="181" y="-181.3" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge1" class="edge"><title>mu_a->a</title> <path fill="none" stroke="black" d="M139.002,-277.996C144.994,-267.011 151.751,-254.623 158.065,-243.048"/> <polygon fill="black" stroke="black" points="161.295,-244.435 163.011,-233.98 155.15,-241.083 161.295,-244.435"/> </g> <g id="node3" class="node"><title>sigma_a</title> <ellipse fill="none" stroke="black" cx="242" cy="-312" rx="60.2186" ry="37.4533"/> <text text-anchor="middle" x="242" y="-323.3" font-family="Times,serif" font-size="14.00">sigma_a</text> <text text-anchor="middle" x="242" y="-308.3" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="242" y="-293.3" font-family="Times,serif" font-size="14.00">HalfCauchy</text> </g> <g id="edge2" class="edge"><title>sigma_a->a</title> <path fill="none" stroke="black" d="M222.699,-276.195C216.864,-265.672 210.393,-254.004 204.324,-243.06"/> <polygon fill="black" stroke="black" points="207.254,-241.126 199.343,-234.078 201.132,-244.521 207.254,-241.126"/> </g> <g id="edge4" class="edge"><title>a->y_like</title> <path fill="none" stroke="black" d="M164.17,-165.616C157.161,-151.713 148.938,-135.401 141.499,-120.646"/> <polygon fill="black" stroke="black" points="144.614,-119.049 136.987,-111.695 138.363,-122.2 144.614,-119.049"/> </g> </g> </svg>
In [14]:
with partial_pooling:
    advi_part_pool=pm.fit(40000, method = 'advi', random_seed = SEED)
100.00% [40000/40000 00:24<00:00 Average Loss = 9,725.4]
Finished [100%]: Average Loss = 9,725.4
In [15]:
advi_elbo = pd.DataFrame(
    {'ELBO': -advi_part_pool.hist,
     'n': np.arange(advi_part_pool.hist.shape[0])})

_=sns.lineplot(y='ELBO', x='n', data=advi_elbo)

We see really strong convergence with ADVI.

In [16]:
az.summary(advi_part_pool.sample(), round_to=2)
arviz - WARNING - Shape validation failed: input_shape: (1, 500), minimum_shape: (chains=2, draws=4)
Out[16]:
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
mu_a 0.83 0.01 0.81 0.84 0.0 0.0 595.09 535.73 NaN
a[0] 0.77 0.02 0.74 0.80 0.0 0.0 484.71 474.04 NaN
a[1] 0.82 0.01 0.80 0.84 0.0 0.0 564.98 512.95 NaN
a[2] 0.80 0.02 0.77 0.84 0.0 0.0 516.49 498.67 NaN
a[3] 0.82 0.01 0.80 0.84 0.0 0.0 396.68 317.58 NaN
a[4] 0.83 0.02 0.79 0.86 0.0 0.0 456.14 504.56 NaN
a[5] 0.80 0.02 0.77 0.83 0.0 0.0 562.35 558.68 NaN
a[6] 0.85 0.02 0.82 0.88 0.0 0.0 437.14 438.47 NaN
a[7] 0.84 0.02 0.81 0.88 0.0 0.0 429.89 542.44 NaN
a[8] 0.84 0.02 0.81 0.87 0.0 0.0 518.96 407.46 NaN
a[9] 0.81 0.02 0.78 0.84 0.0 0.0 491.43 555.18 NaN
a[10] 0.81 0.02 0.78 0.84 0.0 0.0 334.00 425.00 NaN
a[11] 0.82 0.01 0.80 0.85 0.0 0.0 439.21 438.47 NaN
a[12] 0.80 0.01 0.78 0.82 0.0 0.0 570.34 510.31 NaN
a[13] 0.80 0.01 0.78 0.83 0.0 0.0 530.56 424.32 NaN
a[14] 0.80 0.01 0.78 0.82 0.0 0.0 417.19 384.56 NaN
a[15] 0.86 0.01 0.83 0.88 0.0 0.0 350.51 501.06 NaN
a[16] 0.83 0.02 0.79 0.86 0.0 0.0 515.50 393.51 NaN
a[17] 0.84 0.02 0.81 0.88 0.0 0.0 396.42 350.13 NaN
a[18] 0.92 0.01 0.90 0.94 0.0 0.0 470.65 458.50 NaN
a[19] 0.88 0.01 0.86 0.91 0.0 0.0 407.33 463.33 NaN
a[20] 0.78 0.02 0.74 0.82 0.0 0.0 413.31 555.18 NaN
a[21] 0.82 0.01 0.79 0.85 0.0 0.0 464.97 408.34 NaN
a[22] 0.83 0.01 0.80 0.85 0.0 0.0 464.00 438.95 NaN
a[23] 0.84 0.02 0.81 0.87 0.0 0.0 425.82 423.87 NaN
a[24] 0.85 0.01 0.83 0.88 0.0 0.0 405.93 416.00 NaN
a[25] 0.81 0.01 0.79 0.84 0.0 0.0 564.01 472.24 NaN
a[26] 0.85 0.01 0.83 0.88 0.0 0.0 505.30 450.03 NaN
a[27] 0.81 0.02 0.78 0.84 0.0 0.0 550.73 461.64 NaN
a[28] 0.79 0.02 0.75 0.82 0.0 0.0 501.34 463.33 NaN
a[29] 0.80 0.02 0.77 0.84 0.0 0.0 486.76 473.41 NaN
a[30] 0.88 0.01 0.85 0.90 0.0 0.0 481.79 421.32 NaN
sigma_a 0.04 0.01 0.03 0.05 0.0 0.0 459.43 461.64 NaN
sigma_y 0.37 0.00 0.37 0.38 0.0 0.0 500.34 473.99 NaN
In [17]:
az.plot_forest(advi_part_pool.sample());
In [18]:
az.plot_trace(advi_part_pool.sample(10000));
In [19]:
sample_trace = advi_part_pool.sample().posterior['a']
_, sample, districts=sample_trace.shape
jitter = np.random.normal(scale=0.01, size=districts)

means = sample_trace.mean(axis=1)[0,:]
sd = sample_trace.std(axis=1)[0,:]
fig, ax=plt.subplots()
ax.scatter(np.arange(31), means)
#ax.set_xscale('log')
ax.set_xlim(-0.9,32)
ax.set_ylim(0.7, 0.95)
ax.hlines(sample_trace.mean(), -0.9, 32, linestyles='--')
ax.set_ylabel("Proportion of pretrial release")
ax.set_xlabel("Judicial District")
plt.suptitle("Estimated chance of pretrial release by judicial district, using partial pooling")

for j,n,m,s in zip(jitter, np.arange(31), means, sd):
    ax.plot([n]*2, [m-s, m+s], 'b-')
In [20]:
dist_only=minimal_df.groupby('Locality_JudicialCircuit').agg({'WhetherDefendantWasReleasedPretrial':'describe'})
In [21]:
mean=minimal_df['WhetherDefendantWasReleasedPretrial'].mean()
means=dist_only['WhetherDefendantWasReleasedPretrial']['mean'].values
sd=dist_only['WhetherDefendantWasReleasedPretrial']['std'].values
fig, ax=plt.subplots()
ax.scatter(np.arange(31), means)
#ax.set_xscale('log')
ax.set_xlim(-0.9,32)
ax.set_ylim(0.3, 1.3)
ax.hlines(sample_trace.mean(), -0.9, 32, linestyles='--')
ax.set_ylabel("Pretrial release rate")
ax.set_xlabel("Judicial District")
plt.title("Proportion of pretrial release by judicial district")
plt.suptitle("Frequentist Version")
for j,n,m,s in zip(jitter, np.arange(31), means, sd):
    ax.plot([n]*2, [m-s, m+s], 'b-')

Varying intercept and slope model¶

In [9]:
with pm.Model() as varying_intercept_slope:
    # Priors
    mu_a = pm.Normal('mu_a', mu=0., sigma=1e5)
    sigma_a = pm.Exponential("sigma_a", 0.5)
    mu_b = pm.Normal('mu_b', mu=0., sigma=1e5)
    sigma_b = pm.HalfCauchy('sigma_b', 5)
    
    #sigma_b = pm.Exponential("sigma_b", .05)
    # Random intercepts
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=districts)
    # Random slopes
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, shape=districts)
    # Model error
    sigma_y = pm.Uniform('sigma_y', lower=0, upper=100)
    # Expected value
    y_hat = a[district] + b[district] * minimal_df['Indicator_Black'].values
    # Data likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sigma=sigma_y, observed=minimal_df['WhetherDefendantWasReleasedPretrial'].values)
    
    
    step = pm.NUTS(target_accept=0.9)
    varying_intercept_slope_trace=pm.sample(8000, step = step, random_seed = SEED)
In [47]:
    
pm.model_to_graphviz(varying_intercept_slope)
Out[47]:
<svg width="461pt" height="355pt" viewBox="0.00 0.00 460.97 354.86" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 350.86)"> <polygon fill="white" stroke="none" points="-4,4 -4,-350.86 456.97,-350.86 456.97,4 -4,4"/> <g id="clust1" class="cluster"> <title>cluster31</title> <path fill="none" stroke="black" d="M137.98,-129.95C137.98,-129.95 314.98,-129.95 314.98,-129.95 320.98,-129.95 326.98,-135.95 326.98,-141.95 326.98,-141.95 326.98,-231.91 326.98,-231.91 326.98,-237.91 320.98,-243.91 314.98,-243.91 314.98,-243.91 137.98,-243.91 137.98,-243.91 131.98,-243.91 125.98,-237.91 125.98,-231.91 125.98,-231.91 125.98,-141.95 125.98,-141.95 125.98,-135.95 131.98,-129.95 137.98,-129.95"/> <text text-anchor="middle" x="311.98" y="-137.75" font-family="Times,serif" font-size="14.00">31</text> </g> <g id="clust2" class="cluster"> <title>cluster22463</title> <path fill="none" stroke="black" d="M137.98,-8C137.98,-8 213.98,-8 213.98,-8 219.98,-8 225.98,-14 225.98,-20 225.98,-20 225.98,-109.95 225.98,-109.95 225.98,-115.95 219.98,-121.95 213.98,-121.95 213.98,-121.95 137.98,-121.95 137.98,-121.95 131.98,-121.95 125.98,-115.95 125.98,-109.95 125.98,-109.95 125.98,-20 125.98,-20 125.98,-14 131.98,-8 137.98,-8"/> <text text-anchor="middle" x="200.98" y="-15.8" font-family="Times,serif" font-size="14.00">22463</text> </g> <g id="node1" class="node"> <title>sigma_y</title> <ellipse fill="none" stroke="black" cx="70.98" cy="-198.43" rx="45.01" ry="37.45"/> <text text-anchor="middle" x="70.98" y="-209.73" font-family="Times,serif" font-size="14.00">sigma_y</text> <text text-anchor="middle" x="70.98" y="-194.73" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="70.98" y="-179.73" font-family="Times,serif" font-size="14.00">Uniform</text> </g> <g id="node8" class="node"> <title>y_like</title> <ellipse fill="lightgrey" stroke="black" cx="175.98" cy="-76.48" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="175.98" y="-87.78" font-family="Times,serif" font-size="14.00">y_like</text> <text text-anchor="middle" x="175.98" y="-72.78" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="175.98" y="-57.78" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge5" class="edge"> <title>sigma_y->y_like</title> <path fill="none" stroke="black" d="M93.38,-165.74C101.96,-154.17 112.07,-141.18 121.98,-129.95 127.58,-123.62 133.8,-117.13 139.96,-110.98"/> <polygon fill="black" stroke="black" points="142.45,-113.44 147.14,-103.93 137.55,-108.44 142.45,-113.44"/> </g> <g id="node2" class="node"> <title>mu_a</title> <ellipse fill="none" stroke="black" cx="276.98" cy="-309.38" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="276.98" y="-320.68" font-family="Times,serif" font-size="14.00">mu_a</text> <text text-anchor="middle" x="276.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="276.98" y="-290.68" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="node7" class="node"> <title>a</title> <ellipse fill="none" stroke="black" cx="276.98" cy="-198.43" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="276.98" y="-209.73" font-family="Times,serif" font-size="14.00">a</text> <text text-anchor="middle" x="276.98" y="-194.73" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="276.98" y="-179.73" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge2" class="edge"> <title>mu_a->a</title> <path fill="none" stroke="black" d="M276.98,-271.8C276.98,-263.63 276.98,-254.85 276.98,-246.32"/> <polygon fill="black" stroke="black" points="280.48,-246.1 276.98,-236.1 273.48,-246.1 280.48,-246.1"/> </g> <g id="node3" class="node"> <title>sigma_b</title> <ellipse fill="none" stroke="black" cx="57.98" cy="-309.38" rx="57.97" ry="37.45"/> <text text-anchor="middle" x="57.98" y="-320.68" font-family="Times,serif" font-size="14.00">sigma_b</text> <text text-anchor="middle" x="57.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="57.98" y="-290.68" font-family="Times,serif" font-size="14.00">HalfCauchy</text> </g> <g id="node6" class="node"> <title>b</title> <ellipse fill="none" stroke="black" cx="175.98" cy="-198.43" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="175.98" y="-209.73" font-family="Times,serif" font-size="14.00">b</text> <text text-anchor="middle" x="175.98" y="-194.73" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="175.98" y="-179.73" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge4" class="edge"> <title>sigma_b->b</title> <path fill="none" stroke="black" d="M90.54,-278.32C105.78,-264.25 123.99,-247.43 139.68,-232.95"/> <polygon fill="black" stroke="black" points="142.18,-235.41 147.15,-226.05 137.43,-230.26 142.18,-235.41"/> </g> <g id="node4" class="node"> <title>mu_b</title> <ellipse fill="none" stroke="black" cx="175.98" cy="-309.38" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="175.98" y="-320.68" font-family="Times,serif" font-size="14.00">mu_b</text> <text text-anchor="middle" x="175.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="175.98" y="-290.68" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge3" class="edge"> <title>mu_b->b</title> <path fill="none" stroke="black" d="M175.98,-271.8C175.98,-263.63 175.98,-254.85 175.98,-246.32"/> <polygon fill="black" stroke="black" points="179.48,-246.1 175.98,-236.1 172.48,-246.1 179.48,-246.1"/> </g> <g id="node5" class="node"> <title>sigma_a</title> <ellipse fill="none" stroke="black" cx="394.98" cy="-309.38" rx="57.97" ry="37.45"/> <text text-anchor="middle" x="394.98" y="-320.68" font-family="Times,serif" font-size="14.00">sigma_a</text> <text text-anchor="middle" x="394.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="394.98" y="-290.68" font-family="Times,serif" font-size="14.00">Exponential</text> </g> <g id="edge1" class="edge"> <title>sigma_a->a</title> <path fill="none" stroke="black" d="M362.43,-278.32C347.18,-264.25 328.97,-247.43 313.29,-232.95"/> <polygon fill="black" stroke="black" points="315.53,-230.26 305.81,-226.05 310.79,-235.41 315.53,-230.26"/> </g> <g id="edge6" class="edge"> <title>b->y_like</title> <path fill="none" stroke="black" d="M175.98,-160.79C175.98,-149.38 175.98,-136.65 175.98,-124.63"/> <polygon fill="black" stroke="black" points="179.48,-124.31 175.98,-114.31 172.48,-124.31 179.48,-124.31"/> </g> <g id="edge7" class="edge"> <title>a->y_like</title> <path fill="none" stroke="black" d="M255.17,-166.2C246.7,-154.55 236.71,-141.4 226.98,-129.95 221.89,-123.96 216.28,-117.8 210.7,-111.91"/> <polygon fill="black" stroke="black" points="213.14,-109.39 203.69,-104.61 208.1,-114.24 213.14,-109.39"/> </g> </g> </svg>
In [16]:
az.plot_forest(varying_intercept_slope_trace);
In [12]:
with varying_intercept_slope:
    pm.plot_trace(varying_intercept_slope_trace)
In [46]:
varying_intercept_slope_trace
Out[46]:
arviz.InferenceData
    • <xarray.Dataset>
      Dimensions:  (chain: 4, draw: 8000, a_dim_0: 31, b_dim_0: 31)
      Coordinates:
        * chain    (chain) int64 0 1 2 3
        * draw     (draw) int64 0 1 2 3 4 5 6 7 ... 7993 7994 7995 7996 7997 7998 7999
        * a_dim_0  (a_dim_0) int64 0 1 2 3 4 5 6 7 8 9 ... 22 23 24 25 26 27 28 29 30
        * b_dim_0  (b_dim_0) int64 0 1 2 3 4 5 6 7 8 9 ... 22 23 24 25 26 27 28 29 30
      Data variables:
          mu_a     (chain, draw) float64 0.8377 0.8338 0.839 ... 0.8477 0.8388 0.8401
          mu_b     (chain, draw) float64 -0.03049 -0.04328 ... -0.03256 -0.03794
          a        (chain, draw, a_dim_0) float64 0.796 0.819 0.85 ... 0.7885 0.8923
          b        (chain, draw, b_dim_0) float64 -0.04281 -0.03986 ... 0.0009009
          sigma_a  (chain, draw) float64 0.03069 0.03164 0.03194 ... 0.04317 0.03777
          sigma_b  (chain, draw) float64 0.0398 0.03874 0.02839 ... 0.03045 0.05148
          sigma_y  (chain, draw) float64 0.3748 0.3743 0.372 ... 0.369 0.3745 0.3686
      Attributes:
          created_at:                 2022-12-06T17:59:44.518942
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.2
          sampling_time:              213.41605377197266
          tuning_steps:               1000
      xarray.Dataset
        • chain: 4
        • draw: 8000
        • a_dim_0: 31
        • b_dim_0: 31
        • chain
          (chain)
          int64
          0 1 2 3
          array([0, 1, 2, 3])
        • draw
          (draw)
          int64
          0 1 2 3 4 ... 7996 7997 7998 7999
          array([   0,    1,    2, ..., 7997, 7998, 7999])
        • a_dim_0
          (a_dim_0)
          int64
          0 1 2 3 4 5 6 ... 25 26 27 28 29 30
          array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
                 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])
        • b_dim_0
          (b_dim_0)
          int64
          0 1 2 3 4 5 6 ... 25 26 27 28 29 30
          array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
                 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])
        • mu_a
          (chain, draw)
          float64
          0.8377 0.8338 ... 0.8388 0.8401
          array([[0.83772976, 0.8338105 , 0.8390078 , ..., 0.84147442, 0.83813332,
                  0.84135346],
                 [0.84328525, 0.84500887, 0.84017269, ..., 0.84505557, 0.83385963,
                  0.83363094],
                 [0.82986903, 0.84291231, 0.83929379, ..., 0.84004923, 0.84541786,
                  0.82835457],
                 [0.85074373, 0.83148048, 0.84528992, ..., 0.84772842, 0.83880744,
                  0.84014435]])
        • mu_b
          (chain, draw)
          float64
          -0.03049 -0.04328 ... -0.03794
          array([[-0.03049227, -0.0432816 , -0.02524011, ..., -0.03294595,
                  -0.03154453, -0.03084002],
                 [-0.0327587 , -0.03125098, -0.03233422, ..., -0.04479928,
                  -0.03119359, -0.03739772],
                 [-0.02157373, -0.02663373, -0.03568723, ..., -0.02768779,
                  -0.02509622, -0.0259065 ],
                 [-0.02555666, -0.0271517 , -0.03386417, ..., -0.02952824,
                  -0.03255884, -0.03793723]])
        • a
          (chain, draw, a_dim_0)
          float64
          0.796 0.819 0.85 ... 0.7885 0.8923
          array([[[0.79604425, 0.81895595, 0.85004174, ..., 0.80680134,
                   0.80459238, 0.89818055],
                  [0.79816787, 0.82526436, 0.86096827, ..., 0.82396164,
                   0.84306765, 0.86000057],
                  [0.80383233, 0.83641779, 0.79673049, ..., 0.78879209,
                   0.83876772, 0.87989764],
                  ...,
                  [0.77619275, 0.81954429, 0.84989465, ..., 0.80015721,
                   0.81219013, 0.88785021],
                  [0.8113336 , 0.83929339, 0.82068365, ..., 0.79572902,
                   0.80969878, 0.88912022],
                  [0.7919001 , 0.84721452, 0.88660969, ..., 0.80135515,
                   0.81627117, 0.87786662]],
          
                 [[0.76942951, 0.81882278, 0.84872567, ..., 0.79562576,
                   0.78701414, 0.89481217],
                  [0.7826533 , 0.8450505 , 0.82488672, ..., 0.80909179,
                   0.84261054, 0.88691311],
                  [0.80773519, 0.80486134, 0.82567864, ..., 0.77212776,
                   0.77954014, 0.87105462],
          ...
                  [0.77710476, 0.82694109, 0.81390805, ..., 0.824919  ,
                   0.80093013, 0.90894931],
                  [0.78878285, 0.81615348, 0.84374258, ..., 0.79547143,
                   0.83624875, 0.880254  ],
                  [0.79727432, 0.82623858, 0.82126225, ..., 0.80895559,
                   0.83361983, 0.89283442]],
          
                 [[0.80557577, 0.81534169, 0.8307627 , ..., 0.78108803,
                   0.85997185, 0.89592104],
                  [0.79993115, 0.82953326, 0.83704119, ..., 0.80204142,
                   0.76131517, 0.86453169],
                  [0.78355062, 0.81686455, 0.80345988, ..., 0.77349627,
                   0.86347271, 0.86858118],
                  ...,
                  [0.8046629 , 0.82949042, 0.8540158 , ..., 0.77618399,
                   0.77115671, 0.88714004],
                  [0.75745154, 0.81481995, 0.82831273, ..., 0.81561534,
                   0.84604071, 0.88989243],
                  [0.80890272, 0.83497248, 0.83115027, ..., 0.76409668,
                   0.78847551, 0.89229085]]])
        • b
          (chain, draw, b_dim_0)
          float64
          -0.04281 -0.03986 ... 0.0009009
          array([[[-0.04280919, -0.03986138, -0.0498043 , ..., -0.05211812,
                   -0.01713397, -0.01677846],
                  [-0.07285494, -0.00428205, -0.04557853, ..., -0.06787101,
                   -0.02109202,  0.00379252],
                  [-0.04642196, -0.03927768, -0.04275801, ..., -0.03132719,
                   -0.02491233,  0.00223774],
                  ...,
                  [-0.03035236, -0.00716208, -0.03950139, ...,  0.01206203,
                   -0.02879346, -0.01110105],
                  [-0.02786191, -0.0386128 , -0.04199093, ..., -0.06190725,
                   -0.04006184, -0.01059542],
                  [-0.06331477, -0.04986357, -0.06525713, ..., -0.05036081,
                   -0.02373299, -0.02239471]],
          
                 [[-0.04040604, -0.00694931, -0.0550391 , ...,  0.00136972,
                   -0.02841754, -0.02387198],
                  [-0.01931544, -0.04822716, -0.02934552, ..., -0.08921853,
                   -0.03318799, -0.01920177],
                  [-0.0447699 , -0.0082404 , -0.05444433, ..., -0.00594416,
                   -0.04148509, -0.00552068],
          ...
                  [-0.04655245,  0.00654832, -0.0506605 , ..., -0.00638147,
                   -0.02652183, -0.0303318 ],
                  [-0.0287951 , -0.01399599, -0.02843793, ..., -0.02010127,
                   -0.0418336 , -0.01930447],
                  [-0.03990548, -0.01506888, -0.03994181, ..., -0.023441  ,
                   -0.00246541, -0.03132055]],
          
                 [[-0.02664095, -0.03192061, -0.02550328, ..., -0.02639188,
                   -0.00606803, -0.03414775],
                  [-0.03569518, -0.03582461, -0.05456925, ..., -0.10181469,
                   -0.10326854, -0.03209002],
                  [-0.01481837, -0.0190864 , -0.01272179, ...,  0.02385523,
                    0.0260288 , -0.04625598],
                  ...,
                  [-0.04381446, -0.03191252, -0.05718656, ..., -0.07409598,
                    0.0419491 , -0.00276225],
                  [-0.01553056, -0.0100312 , -0.01968983, ..., -0.00069574,
                   -0.07171337, -0.00789586],
                  [-0.06615211, -0.04005526, -0.0153701 , ..., -0.10906542,
                    0.01615029,  0.00090094]]])
        • sigma_a
          (chain, draw)
          float64
          0.03069 0.03164 ... 0.04317 0.03777
          array([[0.03068785, 0.03163807, 0.03193719, ..., 0.03004048, 0.02652563,
                  0.02859884],
                 [0.03510044, 0.0371899 , 0.03553372, ..., 0.02877096, 0.03377557,
                  0.02747911],
                 [0.0451038 , 0.03572319, 0.02804348, ..., 0.03739085, 0.02932267,
                  0.02911442],
                 [0.03101597, 0.03603661, 0.03330381, ..., 0.0321107 , 0.0431683 ,
                  0.0377718 ]])
        • sigma_b
          (chain, draw)
          float64
          0.0398 0.03874 ... 0.03045 0.05148
          array([[0.03979907, 0.0387417 , 0.02838811, ..., 0.02123035, 0.02569104,
                  0.02246891],
                 [0.01917533, 0.02781637, 0.03618588, ..., 0.02413779, 0.03124063,
                  0.0217749 ],
                 [0.02947366, 0.0322586 , 0.03896407, ..., 0.03785309, 0.02758817,
                  0.02061987],
                 [0.01918958, 0.03840851, 0.03308845, ..., 0.03821636, 0.03045141,
                  0.05148074]])
        • sigma_y
          (chain, draw)
          float64
          0.3748 0.3743 ... 0.3745 0.3686
          array([[0.3748157 , 0.37434848, 0.37202985, ..., 0.37249541, 0.37076514,
                  0.37080015],
                 [0.37334523, 0.37134734, 0.37076486, ..., 0.36989978, 0.37357053,
                  0.36759252],
                 [0.37243794, 0.37340801, 0.36956249, ..., 0.37384   , 0.37013198,
                  0.37367709],
                 [0.37055253, 0.37275933, 0.37011333, ..., 0.36896636, 0.37448557,
                  0.36857887]])
      • created_at :
        2022-12-06T17:59:44.518942
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.2
        sampling_time :
        213.41605377197266
        tuning_steps :
        1000

    • <xarray.Dataset>
      Dimensions:       (chain: 4, draw: 8000, y_like_dim_0: 22463)
      Coordinates:
        * chain         (chain) int64 0 1 2 3
        * draw          (draw) int64 0 1 2 3 4 5 6 ... 7994 7995 7996 7997 7998 7999
        * y_like_dim_0  (y_like_dim_0) int64 0 1 2 3 4 ... 22459 22460 22461 22462
      Data variables:
          y_like        (chain, draw, y_like_dim_0) float64 -2.374 -2.62 ... -0.07564
      Attributes:
          created_at:                 2022-12-06T18:00:08.481877
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.2
      xarray.Dataset
        • chain: 4
        • draw: 8000
        • y_like_dim_0: 22463
        • chain
          (chain)
          int64
          0 1 2 3
          array([0, 1, 2, 3])
        • draw
          (draw)
          int64
          0 1 2 3 4 ... 7996 7997 7998 7999
          array([   0,    1,    2, ..., 7997, 7998, 7999])
        • y_like_dim_0
          (y_like_dim_0)
          int64
          0 1 2 3 ... 22459 22460 22461 22462
          array([    0,     1,     2, ..., 22460, 22461, 22462])
        • y_like
          (chain, draw, y_like_dim_0)
          float64
          -2.374 -2.62 ... -0.0682 -0.07564
          array([[[-2.37407228e+00, -2.61994482e+00, -1.99688591e+00, ...,
                    9.97156303e-03,  9.97156303e-03, -1.11296668e-01],
                  [-2.34245614e+00, -2.66296528e+00, -1.93442594e+00, ...,
                   -8.03048617e-02, -8.03048617e-02, -5.07134308e-02],
                  [-2.49135819e+00, -2.59332548e+00, -2.11684492e+00, ...,
                   -3.41327426e-02, -3.41327426e-02, -7.88216826e-02],
                  ...,
                  [-2.43553138e+00, -2.67978049e+00, -2.31762083e+00, ...,
                   -9.41255994e-02, -9.41255994e-02, -5.82538059e-02],
                  [-2.42745954e+00, -2.52870082e+00, -2.04770705e+00, ...,
                   -3.13383001e-02, -3.13383001e-02, -7.12532632e-02],
                  [-2.33047099e+00, -2.57568963e+00, -2.18748039e+00, ...,
                   -4.06620561e-02, -4.06620561e-02, -7.61875886e-02]],
          
                 [[-2.45164000e+00, -2.66637657e+00, -2.04584701e+00, ...,
                   -6.29147599e-02, -6.29147599e-02, -6.06415103e-02],
                  [-2.32616509e+00, -2.58414007e+00, -2.16749982e+00, ...,
                   -1.65173490e-02, -1.65173490e-02, -7.79987562e-02],
                  [-2.49699536e+00, -2.62240948e+00, -1.96852395e+00, ...,
                   -2.04419411e-02, -2.04419411e-02, -7.71989387e-02],
          ...
                  [-2.14230823e+00, -2.33777685e+00, -2.18161379e+00, ...,
                   -1.17630535e-02, -1.17630535e-02, -3.42044144e-02],
                  [-2.48768865e+00, -2.77067767e+00, -1.94319765e+00, ...,
                   -4.64240936e-02, -4.64240936e-02, -6.78980844e-02],
                  [-2.20826322e+00, -2.36136257e+00, -2.16930240e+00, ...,
                   -3.83898554e-02, -3.83898554e-02, -6.22545713e-02]],
          
                 [[-2.38712534e+00, -2.68795134e+00, -2.07145236e+00, ...,
                   -9.62561067e-02, -9.62561067e-02, -9.69843560e-02],
                  [-2.54422763e+00, -2.60167279e+00, -1.72433098e+00, ...,
                   -3.12943790e-02, -3.12943790e-02, -8.52514309e-02],
                  [-2.06771519e+00, -2.53206587e+00, -1.80234325e+00, ...,
                   -8.38080085e-02, -8.38080085e-02, -7.42570864e-02],
                  ...,
                  [-2.48780665e+00, -2.64386444e+00, -2.04145628e+00, ...,
                   -4.35178764e-02, -4.35178764e-02, -7.23803546e-02],
                  [-2.11882516e+00, -2.77800369e+00, -2.10106355e+00, ...,
                   -2.38693308e-02, -2.38693308e-02, -7.26021269e-02],
                  [-2.35852468e+00, -2.32748748e+00, -2.02651129e+00, ...,
                   -6.82040360e-02, -6.82040360e-02, -7.56367600e-02]]])
      • created_at :
        2022-12-06T18:00:08.481877
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.2

    • <xarray.Dataset>
      Dimensions:              (chain: 4, draw: 8000)
      Coordinates:
        * chain                (chain) int64 0 1 2 3
        * draw                 (draw) int64 0 1 2 3 4 5 ... 7995 7996 7997 7998 7999
      Data variables: (12/16)
          tree_depth           (chain, draw) int64 4 5 5 4 4 4 4 4 ... 4 4 4 4 4 4 4 4
          process_time_diff    (chain, draw) float64 0.01416 0.02832 ... 0.01868
          energy_error         (chain, draw) float64 -0.0115 -0.1693 ... 0.2123
          perf_counter_start   (chain, draw) float64 3.236e+05 3.236e+05 ... 3.238e+05
          max_energy_error     (chain, draw) float64 -0.3379 -0.3193 ... 0.2123
          smallest_eigval      (chain, draw) float64 nan nan nan nan ... nan nan nan
          ...                   ...
          perf_counter_diff    (chain, draw) float64 0.01416 0.02832 ... 0.01868
          n_steps              (chain, draw) float64 15.0 31.0 31.0 ... 15.0 15.0 15.0
          lp                   (chain, draw) float64 -9.559e+03 ... -9.566e+03
          step_size            (chain, draw) float64 0.234 0.234 ... 0.2552 0.2552
          step_size_bar        (chain, draw) float64 0.2279 0.2279 ... 0.2785 0.2785
          energy               (chain, draw) float64 9.59e+03 9.591e+03 ... 9.596e+03
      Attributes:
          created_at:                 2022-12-06T17:59:44.532085
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.2
          sampling_time:              213.41605377197266
          tuning_steps:               1000
      xarray.Dataset
        • chain: 4
        • draw: 8000
        • chain
          (chain)
          int64
          0 1 2 3
          array([0, 1, 2, 3])
        • draw
          (draw)
          int64
          0 1 2 3 4 ... 7996 7997 7998 7999
          array([   0,    1,    2, ..., 7997, 7998, 7999])
        • tree_depth
          (chain, draw)
          int64
          4 5 5 4 4 4 4 4 ... 4 4 4 4 4 4 4 4
          array([[4, 5, 5, ..., 5, 4, 4],
                 [4, 4, 4, ..., 4, 4, 4],
                 [4, 4, 4, ..., 4, 4, 4],
                 [4, 4, 4, ..., 4, 4, 4]])
        • process_time_diff
          (chain, draw)
          float64
          0.01416 0.02832 ... 0.01867 0.01868
          array([[0.014156, 0.028318, 0.028216, ..., 0.032818, 0.016267, 0.016261],
                 [0.016169, 0.015779, 0.016345, ..., 0.0127  , 0.012698, 0.012692],
                 [0.015293, 0.015464, 0.015408, ..., 0.016852, 0.016882, 0.016093],
                 [0.014157, 0.014144, 0.014167, ..., 0.018662, 0.018673, 0.018684]])
        • energy_error
          (chain, draw)
          float64
          -0.0115 -0.1693 ... -0.01487 0.2123
          array([[-0.01150387, -0.16932405, -0.15535675, ..., -0.1907942 ,
                  -0.15014058, -0.19461693],
                 [ 0.29591031,  0.15917933,  0.06408026, ...,  0.39520656,
                  -0.01190527, -0.12540702],
                 [-0.12433265, -0.4203214 ,  0.15701044, ..., -0.06919618,
                  -0.50779392,  0.21316459],
                 [ 0.17989013, -0.01578431,  0.41774077, ..., -0.12669724,
                  -0.01486732,  0.21230829]])
        • perf_counter_start
          (chain, draw)
          float64
          3.236e+05 3.236e+05 ... 3.238e+05
          array([[323641.01335275, 323641.02768696, 323641.05618704, ...,
                  323802.41918787, 323802.4521845 , 323802.46860712],
                 [323666.89497575, 323666.91136263, 323666.927339  , ...,
                  323812.51955671, 323812.53237838, 323812.5451975 ],
                 [323655.76297275, 323655.77844429, 323655.79407696, ...,
                  323799.97793075, 323799.99495446, 323800.01200625],
                 [323640.78610487, 323640.80043887, 323640.81476246, ...,
                  323782.53484508, 323782.55372879, 323782.57260646]])
        • max_energy_error
          (chain, draw)
          float64
          -0.3379 -0.3193 ... -0.2009 0.2123
          array([[-0.3378666 , -0.31926061, -0.42854383, ..., -0.23752896,
                  -0.64324071, -0.22760281],
                 [ 0.75599487,  0.33576508,  0.33337627, ...,  0.52967758,
                   0.15262856, -0.21454665],
                 [-0.26226107, -0.4203214 ,  0.28077331, ..., -0.44004833,
                  -0.50779392, -0.54148439],
                 [-0.57644373, -0.69754492,  0.60385919, ..., -0.22415017,
                  -0.20091439,  0.21230829]])
        • smallest_eigval
          (chain, draw)
          float64
          nan nan nan nan ... nan nan nan nan
          array([[nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan]])
        • largest_eigval
          (chain, draw)
          float64
          nan nan nan nan ... nan nan nan nan
          array([[nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan],
                 [nan, nan, nan, ..., nan, nan, nan]])
        • acceptance_rate
          (chain, draw)
          float64
          0.9828 1.0 0.9741 ... 1.0 0.939
          array([[0.98279503, 1.        , 0.97414955, ..., 0.97806027, 0.85279876,
                  0.96750395],
                 [0.69957575, 0.8701549 , 0.89178406, ..., 0.75650179, 0.98188143,
                  0.9490029 ],
                 [0.9823279 , 0.98566538, 0.89066998, ..., 0.93717365, 0.94608163,
                  0.91206307],
                 [0.87403805, 1.        , 0.78344625, ..., 0.95633725, 1.        ,
                  0.93899012]])
        • index_in_trajectory
          (chain, draw)
          int64
          -11 6 7 -9 11 8 ... -9 2 4 6 10 13
          array([[-11,   6,   7, ...,   5,  13,   9],
                 [ -4,   9, -14, ...,   9,  12,   8],
                 [ -7,   5,  -6, ...,  -4,   6,  -6],
                 [  6,  12, -14, ...,   6,  10,  13]])
        • diverging
          (chain, draw)
          bool
          False False False ... False False
          array([[False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False],
                 [False, False, False, ..., False, False, False]])
        • perf_counter_diff
          (chain, draw)
          float64
          0.01416 0.02832 ... 0.01867 0.01868
          array([[0.01415671, 0.02831754, 0.02821525, ..., 0.03281808, 0.01626704,
                  0.01626275],
                 [0.01616904, 0.01577538, 0.01634471, ..., 0.01269958, 0.01269758,
                  0.01269163],
                 [0.01529154, 0.015463  , 0.01540683, ..., 0.01685179, 0.01688217,
                  0.01609333],
                 [0.01415658, 0.01414433, 0.01416533, ..., 0.01866254, 0.01867229,
                  0.01868158]])
        • n_steps
          (chain, draw)
          float64
          15.0 31.0 31.0 ... 15.0 15.0 15.0
          array([[15., 31., 31., ..., 31., 15., 15.],
                 [15., 15., 15., ..., 15., 15., 15.],
                 [15., 15., 15., ..., 15., 15., 15.],
                 [15., 15., 15., ..., 15., 15., 15.]])
        • lp
          (chain, draw)
          float64
          -9.559e+03 ... -9.566e+03
          array([[-9559.07308121, -9552.17312629, -9542.83704896, ...,
                  -9537.48586683, -9537.03848122, -9533.52536332],
                 [-9550.68300064, -9560.08677105, -9562.23708605, ...,
                  -9549.76618939, -9546.23294348, -9543.66552917],
                 [-9558.62955971, -9545.62387764, -9547.72219522, ...,
                  -9560.40693634, -9542.59349567, -9547.95788024],
                 [-9553.59560102, -9559.85239338, -9570.19502517, ...,
                  -9556.96174394, -9557.60291782, -9566.23509678]])
        • step_size
          (chain, draw)
          float64
          0.234 0.234 0.234 ... 0.2552 0.2552
          array([[0.23395366, 0.23395366, 0.23395366, ..., 0.23395366, 0.23395366,
                  0.23395366],
                 [0.25548358, 0.25548358, 0.25548358, ..., 0.25548358, 0.25548358,
                  0.25548358],
                 [0.27582472, 0.27582472, 0.27582472, ..., 0.27582472, 0.27582472,
                  0.27582472],
                 [0.25521021, 0.25521021, 0.25521021, ..., 0.25521021, 0.25521021,
                  0.25521021]])
        • step_size_bar
          (chain, draw)
          float64
          0.2279 0.2279 ... 0.2785 0.2785
          array([[0.22786802, 0.22786802, 0.22786802, ..., 0.22786802, 0.22786802,
                  0.22786802],
                 [0.25292578, 0.25292578, 0.25292578, ..., 0.25292578, 0.25292578,
                  0.25292578],
                 [0.28014066, 0.28014066, 0.28014066, ..., 0.28014066, 0.28014066,
                  0.28014066],
                 [0.27849976, 0.27849976, 0.27849976, ..., 0.27849976, 0.27849976,
                  0.27849976]])
        • energy
          (chain, draw)
          float64
          9.59e+03 9.591e+03 ... 9.596e+03
          array([[9590.41867014, 9591.00419619, 9583.84958938, ..., 9577.70896375,
                  9565.687625  , 9560.62214072],
                 [9585.12222109, 9592.3204204 , 9604.12838542, ..., 9575.29531056,
                  9585.77208815, 9578.063827  ],
                 [9595.64658606, 9583.36492609, 9574.17931404, ..., 9597.45657712,
                  9589.99332701, 9572.16413174],
                 [9591.23811161, 9586.717821  , 9608.2810974 , ..., 9591.05324967,
                  9587.38530008, 9595.81938307]])
      • created_at :
        2022-12-06T17:59:44.532085
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.2
        sampling_time :
        213.41605377197266
        tuning_steps :
        1000

    • <xarray.Dataset>
      Dimensions:       (y_like_dim_0: 22463)
      Coordinates:
        * y_like_dim_0  (y_like_dim_0) int64 0 1 2 3 4 ... 22459 22460 22461 22462
      Data variables:
          y_like        (y_like_dim_0) float64 0.0 0.0 0.0 1.0 0.0 ... 1.0 1.0 1.0 1.0
      Attributes:
          created_at:                 2022-12-06T18:00:08.498836
          arviz_version:              0.12.1
          inference_library:          pymc
          inference_library_version:  4.2.2
      xarray.Dataset
        • y_like_dim_0: 22463
        • y_like_dim_0
          (y_like_dim_0)
          int64
          0 1 2 3 ... 22459 22460 22461 22462
          array([    0,     1,     2, ..., 22460, 22461, 22462])
        • y_like
          (y_like_dim_0)
          float64
          0.0 0.0 0.0 1.0 ... 1.0 1.0 1.0 1.0
          array([0., 0., 0., ..., 1., 1., 1.])
      • created_at :
        2022-12-06T18:00:08.498836
        arviz_version :
        0.12.1
        inference_library :
        pymc
        inference_library_version :
        4.2.2

In [45]:
# County predictions
xvals = np.arange(2)
b = varying_intercept_slope_trace.posterior.a.mean(axis=1).mean(axis=0).values
m = varying_intercept_slope_trace.posterior.b.mean(axis=1).mean(axis=0).values
plt.figure(figsize = (8,10))
for bi,mi in zip(b,m):
    plt.plot(xvals, mi*xvals + bi, 'bo-', alpha=0.4)
    plt.xlim(-0.1, 1.1)
    plt.xlabel("Black Indicator")
    plt.ylabel("Proportion released pretrial per judicial circuit")
plt.savefig("District_outcomes_for_black_defendants.png");

A more granular look: differences among the 127 named localities¶

In [50]:
localities_cols=['Defendant_Race', 'WhetherDefendantWasReleasedPretrial','Locality_Name']
In [52]:
locality_df=pd.read_csv("~/MSDS/Pretrial Detention Capstone/October 2017 Cohort_Virginia Pretrial Data Project_Deidentified FINAL Update_10272021.csv", usecols=localities_cols)
In [65]:
locality_df=locality_df.query("((Defendant_Race=='W')|(Defendant_Race=='B'))")
locality_df=locality_df.query("Locality_Name!=' '")
locality_df=locality_df.query("WhetherDefendantWasReleasedPretrial!=9")
locality_df['Indicator_Black']=(locality_df['Defendant_Race']=='B')
In [59]:
# lookup table (dict) for each county


localities=locality_df.Locality_Name.unique()
num_local=locality_df.Locality_Name.nunique()
local_lookup=dict(zip(localities, range(num_local)))
In [62]:
locality_df['Locality_Number']=locality_df.Locality_Name.map(local_lookup)
In [64]:
localities=locality_df['Locality_Number'].values
In [66]:
with pm.Model() as localities_model:
    # Priors
    mu_a = pm.Normal('mu_a', mu=0., sigma=1e5)
    sigma_a = pm.Exponential("sigma_a", 0.5)
    mu_b = pm.Normal('mu_b', mu=0., sigma=1e5)
    sigma_b = pm.HalfCauchy('sigma_b', 5)
    
    #sigma_b = pm.Exponential("sigma_b", .05)
    # Random intercepts
    a = pm.Normal('a', mu=mu_a, sigma=sigma_a, shape=num_local)
    # Random slopes
    b = pm.Normal('b', mu=mu_b, sigma=sigma_b, shape=num_local)
    # Model error
    sigma_y = pm.Uniform('sigma_y', lower=0, upper=100)
    # Expected value
    y_hat = a[localities] + b[localities] * locality_df['Indicator_Black'].values
    # Data likelihood
    y_like = pm.Normal('y_like', mu=y_hat, sigma=sigma_y, observed=locality_df['WhetherDefendantWasReleasedPretrial'].values)
    
    
    step = pm.NUTS(target_accept=0.9)
    varying_intercept_slope_trace=pm.sample(8000, step = step, random_seed = SEED)
Multiprocess sampling (4 chains in 4 jobs)
NUTS: [mu_a, sigma_a, mu_b, sigma_b, a, b, sigma_y]
100.00% [36000/36000 04:47<00:00 Sampling 4 chains, 0 divergences]
Sampling 4 chains for 1_000 tune and 8_000 draw iterations (4_000 + 32_000 draws total) took 309 seconds.
In [67]:
pm.model_to_graphviz(localities_model)
Out[67]:
<svg width="461pt" height="355pt" viewBox="0.00 0.00 460.97 354.86" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"> <g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 350.86)"> <polygon fill="white" stroke="none" points="-4,4 -4,-350.86 456.97,-350.86 456.97,4 -4,4"/> <g id="clust1" class="cluster"> <title>cluster126</title> <path fill="none" stroke="black" d="M137.98,-129.95C137.98,-129.95 314.98,-129.95 314.98,-129.95 320.98,-129.95 326.98,-135.95 326.98,-141.95 326.98,-141.95 326.98,-231.91 326.98,-231.91 326.98,-237.91 320.98,-243.91 314.98,-243.91 314.98,-243.91 137.98,-243.91 137.98,-243.91 131.98,-243.91 125.98,-237.91 125.98,-231.91 125.98,-231.91 125.98,-141.95 125.98,-141.95 125.98,-135.95 131.98,-129.95 137.98,-129.95"/> <text text-anchor="middle" x="308.48" y="-137.75" font-family="Times,serif" font-size="14.00">126</text> </g> <g id="clust2" class="cluster"> <title>cluster22463</title> <path fill="none" stroke="black" d="M137.98,-8C137.98,-8 213.98,-8 213.98,-8 219.98,-8 225.98,-14 225.98,-20 225.98,-20 225.98,-109.95 225.98,-109.95 225.98,-115.95 219.98,-121.95 213.98,-121.95 213.98,-121.95 137.98,-121.95 137.98,-121.95 131.98,-121.95 125.98,-115.95 125.98,-109.95 125.98,-109.95 125.98,-20 125.98,-20 125.98,-14 131.98,-8 137.98,-8"/> <text text-anchor="middle" x="200.98" y="-15.8" font-family="Times,serif" font-size="14.00">22463</text> </g> <g id="node1" class="node"> <title>sigma_y</title> <ellipse fill="none" stroke="black" cx="70.98" cy="-198.43" rx="45.01" ry="37.45"/> <text text-anchor="middle" x="70.98" y="-209.73" font-family="Times,serif" font-size="14.00">sigma_y</text> <text text-anchor="middle" x="70.98" y="-194.73" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="70.98" y="-179.73" font-family="Times,serif" font-size="14.00">Uniform</text> </g> <g id="node8" class="node"> <title>y_like</title> <ellipse fill="lightgrey" stroke="black" cx="175.98" cy="-76.48" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="175.98" y="-87.78" font-family="Times,serif" font-size="14.00">y_like</text> <text text-anchor="middle" x="175.98" y="-72.78" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="175.98" y="-57.78" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge5" class="edge"> <title>sigma_y->y_like</title> <path fill="none" stroke="black" d="M93.38,-165.74C101.96,-154.17 112.07,-141.18 121.98,-129.95 127.58,-123.62 133.8,-117.13 139.96,-110.98"/> <polygon fill="black" stroke="black" points="142.45,-113.44 147.14,-103.93 137.55,-108.44 142.45,-113.44"/> </g> <g id="node2" class="node"> <title>mu_a</title> <ellipse fill="none" stroke="black" cx="276.98" cy="-309.38" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="276.98" y="-320.68" font-family="Times,serif" font-size="14.00">mu_a</text> <text text-anchor="middle" x="276.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="276.98" y="-290.68" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="node7" class="node"> <title>a</title> <ellipse fill="none" stroke="black" cx="276.98" cy="-198.43" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="276.98" y="-209.73" font-family="Times,serif" font-size="14.00">a</text> <text text-anchor="middle" x="276.98" y="-194.73" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="276.98" y="-179.73" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge2" class="edge"> <title>mu_a->a</title> <path fill="none" stroke="black" d="M276.98,-271.8C276.98,-263.63 276.98,-254.85 276.98,-246.32"/> <polygon fill="black" stroke="black" points="280.48,-246.1 276.98,-236.1 273.48,-246.1 280.48,-246.1"/> </g> <g id="node3" class="node"> <title>sigma_b</title> <ellipse fill="none" stroke="black" cx="57.98" cy="-309.38" rx="57.97" ry="37.45"/> <text text-anchor="middle" x="57.98" y="-320.68" font-family="Times,serif" font-size="14.00">sigma_b</text> <text text-anchor="middle" x="57.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="57.98" y="-290.68" font-family="Times,serif" font-size="14.00">HalfCauchy</text> </g> <g id="node6" class="node"> <title>b</title> <ellipse fill="none" stroke="black" cx="175.98" cy="-198.43" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="175.98" y="-209.73" font-family="Times,serif" font-size="14.00">b</text> <text text-anchor="middle" x="175.98" y="-194.73" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="175.98" y="-179.73" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge4" class="edge"> <title>sigma_b->b</title> <path fill="none" stroke="black" d="M90.54,-278.32C105.78,-264.25 123.99,-247.43 139.68,-232.95"/> <polygon fill="black" stroke="black" points="142.18,-235.41 147.15,-226.05 137.43,-230.26 142.18,-235.41"/> </g> <g id="node4" class="node"> <title>mu_b</title> <ellipse fill="none" stroke="black" cx="175.98" cy="-309.38" rx="41.94" ry="37.45"/> <text text-anchor="middle" x="175.98" y="-320.68" font-family="Times,serif" font-size="14.00">mu_b</text> <text text-anchor="middle" x="175.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="175.98" y="-290.68" font-family="Times,serif" font-size="14.00">Normal</text> </g> <g id="edge3" class="edge"> <title>mu_b->b</title> <path fill="none" stroke="black" d="M175.98,-271.8C175.98,-263.63 175.98,-254.85 175.98,-246.32"/> <polygon fill="black" stroke="black" points="179.48,-246.1 175.98,-236.1 172.48,-246.1 179.48,-246.1"/> </g> <g id="node5" class="node"> <title>sigma_a</title> <ellipse fill="none" stroke="black" cx="394.98" cy="-309.38" rx="57.97" ry="37.45"/> <text text-anchor="middle" x="394.98" y="-320.68" font-family="Times,serif" font-size="14.00">sigma_a</text> <text text-anchor="middle" x="394.98" y="-305.68" font-family="Times,serif" font-size="14.00">~</text> <text text-anchor="middle" x="394.98" y="-290.68" font-family="Times,serif" font-size="14.00">Exponential</text> </g> <g id="edge1" class="edge"> <title>sigma_a->a</title> <path fill="none" stroke="black" d="M362.43,-278.32C347.18,-264.25 328.97,-247.43 313.29,-232.95"/> <polygon fill="black" stroke="black" points="315.53,-230.26 305.81,-226.05 310.79,-235.41 315.53,-230.26"/> </g> <g id="edge6" class="edge"> <title>b->y_like</title> <path fill="none" stroke="black" d="M175.98,-160.79C175.98,-149.38 175.98,-136.65 175.98,-124.63"/> <polygon fill="black" stroke="black" points="179.48,-124.31 175.98,-114.31 172.48,-124.31 179.48,-124.31"/> </g> <g id="edge7" class="edge"> <title>a->y_like</title> <path fill="none" stroke="black" d="M255.17,-166.2C246.7,-154.55 236.71,-141.4 226.98,-129.95 221.89,-123.96 216.28,-117.8 210.7,-111.91"/> <polygon fill="black" stroke="black" points="213.14,-109.39 203.69,-104.61 208.1,-114.24 213.14,-109.39"/> </g> </g> </svg>
In [68]:
with localities_model:
    pm.plot_trace(varying_intercept_slope_trace)
In [70]:
az.plot_forest(varying_intercept_slope_trace);
In [73]:
locality_summary=az.summary(varying_intercept_slope_trace)
In [96]:
locality_summary.loc[locality_summary['mean']< -.056].sort_values('mean')
Out[96]:
mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail r_hat
b[1] -0.112 0.033 -0.176 -0.053 0.001 0.0 2239.0 2769.0 1.0
b[13] -0.087 0.043 -0.170 -0.011 0.001 0.0 4221.0 11471.0 1.0
b[71] -0.080 0.040 -0.155 -0.009 0.001 0.0 5118.0 12015.0 1.0
b[67] -0.070 0.040 -0.150 -0.001 0.000 0.0 7767.0 14886.0 1.0
b[15] -0.063 0.026 -0.113 -0.014 0.000 0.0 11492.0 18885.0 1.0
b[80] -0.061 0.039 -0.136 0.010 0.000 0.0 11217.0 13927.0 1.0
b[61] -0.060 0.039 -0.137 0.012 0.000 0.0 12038.0 12130.0 1.0
b[92] -0.060 0.037 -0.132 0.008 0.000 0.0 11634.0 15263.0 1.0
b[35] -0.057 0.022 -0.099 -0.018 0.000 0.0 12737.0 19083.0 1.0
b[91] -0.057 0.039 -0.130 0.016 0.000 0.0 14945.0 13087.0 1.0
In [97]:
top_diff=locality_summary.loc[locality_summary['mean']< -.056].sort_values('mean').index.str.extract("(\d+)").values.astype(int)
[key for key, val in local_lookup.items() if val in top_diff]
Out[97]:
['Arlington County',
 'Greensville County',
 'Loudoun County',
 'Richmond City',
 'Lunenburg County',
 'Warren County',
 'Accomack County',
 'Bristol City',
 'Southampton County',
 'Brunswick County']
In [103]:
locality_summary.loc[locality_summary['mean']>0].sort_values('mean').index.str.extract("b\[(\d+)")
#[key for key, val in local_lookup.items() if val in pos_slope]
Out[103]:
0
0 True
1 True
2 False
3 False
4 False
... ...
127 False
128 False
129 False
130 False
131 False

132 rows × 1 columns

In [69]:
#localities predictions
xvals = np.arange(2)
b = varying_intercept_slope_trace.posterior.a.mean(axis=1).mean(axis=0).values
m = varying_intercept_slope_trace.posterior.b.mean(axis=1).mean(axis=0).values
plt.figure(figsize = (8,10))
for bi,mi in zip(b,m):
    plt.plot(xvals, mi*xvals + bi, 'bo-', alpha=0.4)
    plt.xlim(-0.1, 1.1)
    plt.xlabel("Black Indicator")
    plt.ylabel("Proportion released pretrial per locality")
plt.savefig("Localities_outcomes_for_black_defendants.png");